Skip to content

MIR rewrite#9299

Open
rtfeldman wants to merge 1289 commits into
mainfrom
codex/remove-lir-for-loop
Open

MIR rewrite#9299
rtfeldman wants to merge 1289 commits into
mainfrom
codex/remove-lir-for-loop

Conversation

@rtfeldman
Copy link
Copy Markdown
Contributor

@rtfeldman rtfeldman commented Mar 27, 2026

Wouldn't be a proper rewrite without a rewrite-rewrite.

Fixes #9388
Fixes #9389
Fixes #9390
Fixes #9391
Fixes #9392
Fixes #9393
Fixes #9395
Fixes #9396

Comment thread src/eval/cir_to_lir.zig Outdated
Comment on lines -23 to -32
/// Comptime-gated tracing for the shared lowering pipeline.
const trace = struct {
const enabled = if (@hasDecl(build_options, "trace_eval")) build_options.trace_eval else false;

fn log(comptime fmt: []const u8, args: anytype) void {
if (comptime enabled) {
std.debug.print("[lower] " ++ fmt ++ "\n", args);
}
}
};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this tracing helpful for tracking down bugs, do we want to remove it?

@rtfeldman rtfeldman force-pushed the codex/remove-lir-for-loop branch 2 times, most recently from 43b6417 to e5bb427 Compare April 19, 2026 01:12
@rtfeldman rtfeldman changed the title Remove dedicated LIR for_loop node MIR rewrite May 13, 2026
@rtfeldman rtfeldman marked this pull request as ready for review May 13, 2026 01:14
rtfeldman and others added 19 commits May 12, 2026 22:00
The semantic audit and post-check architecture scripts are perl, which
isn't preinstalled on Windows. Print a skip notice instead of failing;
Linux/macOS CI still enforces these gates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ValueStorage(...) is a generic factory that is never instantiated.
Only its ValueLoc and NumKind enums are consumed, and the per-arch
backends in x86_64/CodeGen.zig and aarch64/CodeGen.zig duplicate
the same bitmask-freelist allocator logic directly. Document the
two paths forward (delete the factory and move enums next to
consumers, vs. finish wiring both backends through this layer) so
the next person touching it knows it is stalled scaffolding.

The i128 multiply register-clobber fix that originally accompanied
this TODO landed independently upstream as 81bdb69 and was
absorbed by the rebase.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Windows previously fell through to runSequential because fork() doesn't
exist. Add a parallel executor that re-invokes the runner binary with
`--worker <idx>` for each test, reads the serialized result from stdout,
and runs N such children concurrently. JobObject with KILL_ON_JOB_CLOSE
ensures workers tear down with the parent.

The eval runner's runSingleTest is reused as-is in worker mode. On
Phase-1 failures, the runner re-spawns the failing test once per
implemented backend with `--worker-backend <name>` so it can attribute
which backend crashed (POSIX gets this for free via forkAndEval).

A new `onTestStarted` callback moves the per-test "RUN" log line into
the parent so it stays coherent across N workers. POSIX uses the same
callback from launchChild — strictly an improvement over the previous
print-from-forked-child which could interleave.

Default workers on Windows capped at min(cpu_count, 4) because each
worker reserves ~2TB of virtual address space for the LIR shared-memory
image; more concurrent workers trip MapViewOfFileFailed. Users can
override with --threads N.

Verified on Windows with the full eval suite: 1200/1201 pass in 75s
using 4 processes (vs 124s sequential on inspect: subset). The one
failure is a transient MapViewOfFileFailed in Phase 1 where Phase-2
attribution correctly shows every backend passing in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each parallel worker reserved 2 TB of virtual address space for the LIR
runtime image. That reservation was the single biggest contributor to
per-test wall-clock — instrumentation showed ~60 ms of runSingleTest's
~195 ms going into CreateFileMapping + MapViewOfFile alone, and the
system-wide accounting trips MapViewOfFileFailed when N parallel
workers each reserve 2 TB simultaneously, which was why we capped
default workers at 4.

Drop the Windows reservation to 256 MB (large enough for any eval test;
the runtime image actually used is a few MB). Remove the 4-worker cap
so default parallelism follows cpu_count.

Also adds a ROC_EVAL_TIME_WORKER env var that dumps per-phase
timestamps from inside a worker — used to find the 2 TB reservation as
the bottleneck. Costs nothing when unset.

Full suite (1201 tests) on Windows:
  Original parallelization (4 workers, 2 TB):   75 s, 1200/1201 pass
  After reservation shrink (4 workers, 256 MB): 48 s, 1201/1201 pass
  After lifting cap (16 workers, 256 MB):       27 s, 1201/1201 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the one-Child-per-test executor with a persistent worker pool:
each Child spawns once in --worker-stream mode and processes many tests
over a stdin/stdout protocol. Parent sends "<idx>\n" per test, child
writes a u32 length + serialized result to stdout. Child exits cleanly
on stdin EOF.

This amortizes process boot (CreateProcess + main + collectTests +
filter + builtin loading) across all tests assigned to a worker. The
one-shot --worker <N> entry point is kept for the runner's Phase-2
retry path, which needs to vary --worker-backend per spawn and benefits
less from amortization.

Full eval suite on Windows (cpu_count=16, 256 MB shm reservation):

  one-shot, --threads 4    83s
  persistent, --threads 4  74s   -11%
  one-shot, --threads 8    55s
  persistent, --threads 8  45s   -18%
  one-shot, --threads 16   44s
  persistent, --threads 16 35s   -20%

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parallelize eval tests on Windows
Zig installs its default segfault handler via AddVectoredExceptionHandler,
which on Windows runs before SetUnhandledExceptionFilter and short-circuits
our handler in src/base/stack_overflow.zig. The previously failing
stack_overflow test exited with code 5 ("Stack Overflow" stderr from Zig's
handler followed by a secondary access violation while dumping the stack)
instead of our intended 134 with the helpful message.

Disable enable_segfault_handler in the test helper unconditionally and in
the roc CLI on Windows only, preserving Zig's dev-friendly POSIX traces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The parallel eval test runner printed "RUN <name>" for every test as it
started, which is noisy under the default 16-worker fan-out. Move it
behind --verbose so it only fires when the user opts in.

`onTestStarted` is a comptime Pool callback and can't take a closure, so
the flag is stored in a module-level `verbose_logging` populated from
`cli.verbose` in `main` after arg parsing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The test harness's persistent-worker pool unconditionally appends
`--worker-stream` to every child it spawns (src/build/test_harness.zig
runChildPool). The eval runner already handled the flag, but
parallel_cli_runner did not: workers parsed the flag, ignored it, and
fell through to the parent path, where they reentrantly spawned their
own pool of N child workers — each launched with `--worker-stream`
again. The resulting fork-bomb made every worker run another `roc
build`, and `roc build` reserves a 2 TB SEC_RESERVE on Windows
(src/cli/main.zig SHARED_MEMORY_SIZE). The cascading reservations
exhausted the system commit limit, which is why sibling test suites
running in parallel (roc_subcommands_test, glue_test) failed with
CreateProcessW => COMMITMENT_LIMIT / SystemResources and OutOfMemory.

Add the streaming worker branch: read decimal test indices from stdin
(one per line), run each test, write a u32-length-prefixed result to
stdout, loop until EOF. Mirrors src/eval/test/parallel_runner.zig.

Before: test-platforms reported 0 passed / 91 crashed / 16 hung in 62s
on Windows, and test-subcommands / test-glue failed with
COMMITMENT_LIMIT and OutOfMemory.
After: test-platforms is 91 passed / 16 failed in 6.5s (the 16 are
real test errors that were masked by the fork-bomb), test-subcommands
and test-glue pass cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment